
Migrate Benchmark #731

Open

faridyagubbayli wants to merge 2 commits into master from migrate-benchmark

Migrate Benchmark#731
faridyagubbayli wants to merge 2 commits into
masterfrom
migrate-benchmark

Conversation

faridyagubbayli (Collaborator) commented May 14, 2026

Migrates the MATLAB benchmark.m script to Python, with some additions.

benchmarks/README.md shows sample usage.

Example output JSON file:
{
  "comp_size": [
    32768,
    65536,
    131072,
    262144,
    524288
  ],
  "comp_time": [
    2.813780007263025,
    2.5228265166903534,
    2.5327161628132067,
    4.796307735455533,
    10.499971745846173
  ],
  "options": {
    "data_cast": "off",
    "heterogeneous_media": true,
    "absorbing_media": true,
    "nonlinear_media": false,
    "binary_sensor_mask": true,
    "number_sensor_points": 100,
    "number_time_points": 1000,
    "num_averages": 3,
    "start_size": 32,
    "x_scale_array": [
      1,
      2,
      2,
      2,
      4,
      4,
      4,
      8,
      8,
      8,
      16,
      16
    ],
    "y_scale_array": [
      1,
      1,
      2,
      2,
      2,
      4,
      4,
      4,
      8,
      8,
      8,
      16
    ],
    "z_scale_array": [
      1,
      1,
      1,
      2,
      2,
      2,
      4,
      4,
      4,
      8,
      8,
      8
    ],
    "domain_size": 0.022,
    "sensor_radius": 0.01,
    "pml_size": 10,
    "pml_inside": true,
    "report_mem_usage": false,
    "backend": "python",
    "device": "gpu",
    "computer_name": "k-instance",
    "python_version": "3.11.14",
    "platform": "Linux-5.10.0-39-cloud-amd64-x86_64-with-glibc2.31",
    "kwave_python_version": "0.6.1"
  },
  "output_path": "<filename>",
  "error_reached": false,
  "error_message": ""
}
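
For reference, a minimal sketch of consuming such a results file (the filename benchmark_results.json is hypothetical; use whatever output_path the benchmark actually wrote to):

import json
from pathlib import Path

# Hypothetical filename -- substitute the real output_path.
data = json.loads(Path("benchmark_results.json").read_text())
for size, seconds in zip(data["comp_size"], data["comp_time"]):
    print(f"{size:>8} grid points: {seconds:.2f} s (avg over {data['options']['num_averages']} runs)")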

Greptile Summary

This PR ports MATLAB's benchmark.m into a new benchmarks/ package, adding a BenchmarkOptions dataclass, a run() function with injected solver/timer for testability, incremental JSON persistence, and a CLI entry point. The implementation is well-structured with good fault-tolerance design (partial saves after each average).

  • benchmarks/helpers.py defines the full domain setup (build_case), result accumulation (store_case_result, rolling_average), and JSON serialization (save_results).
  • benchmarks/benchmark.py wires everything into a run() loop and a main() CLI; the solver and timer are injectable, making unit tests fast and hermetic (a rough sketch of this shape follows the list).
  • tests/test_benchmark.py provides solid coverage of the happy path and error path, but does not exercise the report_mem_usage=True branch.
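
An illustrative sketch of the injectable-solver pattern described above (names, import paths, and signatures are assumptions, not the PR's exact API; error handling is omitted):

from time import perf_counter
from typing import Any, Callable

# Import path is illustrative; the real helpers live in benchmarks/helpers.py.
from benchmarks.helpers import build_case, grid_sizes, rolling_average, save_results, store_case_result

def run(options, *, solver: Callable[..., Any], timer: Callable[[], float] = perf_counter,
        output_path=None) -> dict[str, Any]:
    result: dict[str, Any] = {"comp_size": [], "comp_time": [], "error_reached": False, "error_message": ""}
    for nx, ny, nz, scale in grid_sizes(options):
        case = build_case(options, nx, ny, nz, scale)         # kgrid, medium, source, sensor, ...
        loop_time = 0.0
        for loop_num in range(1, options.num_averages + 1):
            start = timer()
            solver(*case)                                      # injected: real k-Wave call in production, stub in tests
            loop_time = rolling_average(loop_time, timer() - start, loop_num)
            store_case_result(result, nx * ny * nz, loop_time, float("nan"), False)  # memory reporting omitted here
            if output_path is not None:
                save_results(output_path, result)              # partial save after every average
    return result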

Confidence Score: 4/5

Safe to merge for the typical Linux/macOS benchmark use-case; the nan-serialization defect only bites when --report-mem-usage is passed on Windows.

When report_mem_usage=True is used on Windows, peak_memory_bytes() returns float("nan"), which json.dumps writes as the bare token NaN — a value the JSON specification does not allow, so strict parsers and most non-Python tooling will reject the file. The rest of the benchmark logic (timing, grid construction, incremental saves, error handling) looks correct and is well-tested.

benchmarks/helpers.py — specifically the peak_memory_bytes function and its unchecked nan return value
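
A quick standalone illustration of the serialization pitfall (not code from the PR):

import json

payload = {"mem_usage": [float("nan")]}
print(json.dumps(payload))                 # -> {"mem_usage": [NaN]}  (NaN is not a valid JSON token)
# json.dumps(payload, allow_nan=False)     # raises ValueError instead of emitting the invalid token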

Important Files Changed

Filename — Overview
benchmarks/helpers.py — Core helper module; peak_memory_bytes returns nan on Windows, which propagates to an invalid JSON output file when report_mem_usage=True
benchmarks/benchmark.py — Main benchmark runner with injectable solver/timer; inherits the nan-propagation issue from helpers
tests/test_benchmark.py — Covers grid sizes, build_case, timing aggregation, and error handling, but does not exercise the report_mem_usage code path
benchmarks/__init__.py — Package marker with a single docstring; no logic
benchmarks/README.md — Usage documentation for the benchmark; accurate and complete

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    CLI["main() / CLI args"] --> RUN["run(options, backend, device, ...)"] 
    RUN --> GS["grid_sizes(options)"]
    GS --> LOOP["for each (nx, ny, nz, scale)"]
    LOOP --> BC["build_case(options, nx, ny, nz, scale)"]
    BC --> AVG["for loop_num in 1..num_averages"]
    AVG --> SOLVE["solver(kgrid, medium, source, sensor, ...)"]
    SOLVE --> TIME["elapsed = timer() - start"]
    TIME --> RA["rolling_average(loop_time, elapsed, loop_num)"]
    RA --> MEM{"report_mem_usage?"}
    MEM -- yes --> PMB["peak_memory_bytes() - returns nan on Windows"]
    MEM -- no --> SCR
    PMB --> SCR["store_case_result(result, nx*ny*nz, ...)"]
    SCR --> SAVE["save_results(path, result)"]
    SAVE --> AVG
    LOOP --> EXC{"Exception?"}
    EXC -- yes --> ERR["set error_reached=True, save_results, break"]
    EXC -- no --> LOOP
    RUN --> RET["return result dict"]

Reviews (2): Last reviewed commit: "helpers"

Greptile also left 1 inline comment on this PR.

codecov bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.93%. Comparing base (4b110af) to head (968b5f8).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #731      +/-   ##
==========================================
+ Coverage   74.82%   74.93%   +0.11%     
==========================================
  Files          56       56              
  Lines        8095     8095              
  Branches     1577     1577              
==========================================
+ Hits         6057     6066       +9     
+ Misses       1422     1411      -11     
- Partials      616      618       +2     
Flag            Coverage Δ
3.10            74.91% <ø> (+0.11%) ⬆️
3.11            74.91% <ø> (+0.11%) ⬆️
3.12            74.91% <ø> (+0.11%) ⬆️
3.13            74.91% <ø> (+0.11%) ⬆️
macos-latest    74.86% <ø> (+0.11%) ⬆️
ubuntu-latest   74.86% <ø> (+0.11%) ⬆️
windows-latest  74.87% <ø> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.


Comment thread: benchmarks/benchmark.py (Outdated)
Comment on lines +203 to +213
def _store_case_result(result: dict[str, Any], comp_size: int, comp_time: float, mem_usage: float, report_mem_usage: bool) -> None:
    if len(result["comp_size"]) == 0 or result["comp_size"][-1] != comp_size:
        result["comp_size"].append(comp_size)
        result["comp_time"].append(comp_time)
        if report_mem_usage:
            result["mem_usage"].append(mem_usage)
        return

    result["comp_time"][-1] = comp_time
    if report_mem_usage:
        result["mem_usage"][-1] = mem_usage

P1 Case identity key is not unique for custom scale arrays

_store_case_result identifies a grid case by nx * ny * nz (total grid points). If two entries in the scale arrays produce the same total point count but different shapes — e.g. (2, 1) / (1, 2) with z=(1, 1) both giving 2 * start_size³ — the second case's timing will overwrite the first rather than append a new row. The default scale arrays happen to produce unique products, so this is silent with the defaults, but custom arrays passed via BenchmarkOptions can trivially trigger it and silently drop a case's result.
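
One possible fix, sketched here for illustration only (it assumes adding a "comp_shape" list to the result dict, which the PR does not have): key each case by its full shape rather than by the product of its dimensions.

from typing import Any

def _store_case_result(result: dict[str, Any], shape: tuple[int, int, int], comp_time: float,
                       mem_usage: float, report_mem_usage: bool) -> None:
    # Keying by the full (nx, ny, nz) shape keeps (2, 1, 1)- and (1, 2, 1)-scaled grids distinct
    # even though they contain the same number of points.
    if not result["comp_shape"] or result["comp_shape"][-1] != shape:
        result["comp_shape"].append(shape)
        result["comp_size"].append(shape[0] * shape[1] * shape[2])
        result["comp_time"].append(comp_time)
        if report_mem_usage:
            result["mem_usage"].append(mem_usage)
        return

    result["comp_time"][-1] = comp_time
    if report_mem_usage:
        result["mem_usage"][-1] = mem_usage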

Comment thread: benchmarks/benchmark.py (Outdated)
Comment on lines +246 to +255
def _peak_memory_bytes() -> float:
    try:
        import resource
    except ImportError:
        return float("nan")

    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if platform.system().lower() == "darwin":
        return float(usage)
    return float(usage * 1024)

P1 ru_maxrss reports cumulative process-lifetime peak, not per-run peak

resource.getrusage(RUSAGE_SELF).ru_maxrss returns the highest RSS the process has ever reached since it started, not the peak during the current solver call. When report_mem_usage=True and the benchmark runs multiple grid sizes, every case after the largest one seen so far will record the earlier run's peak — the values will plateau rather than reflect per-run memory. Averaging this monotonically non-decreasing value further masks the issue.
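
If a true per-run figure is wanted, one option — a sketch only, assuming Python-level allocations are an acceptable proxy (tracemalloc does not see native or GPU memory used by the solver) — is tracemalloc, whose peak counter resets on each start():

import tracemalloc

def run_with_peak_python_alloc(solver, *args, **kwargs):
    """Report the peak Python-level allocation during one solver call.

    This is a rough proxy, not RSS: only allocations routed through Python's
    allocators are counted.
    """
    tracemalloc.start()
    try:
        out = solver(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()   # (current, peak) in bytes since start()
    finally:
        tracemalloc.stop()                          # also resets the traced counters
    return out, float(peak)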

Comment thread: benchmarks/benchmark.py (Outdated)
Comment on lines +61 to +64
if self.number_time_points <= 0 or self.num_averages <= 0:
    raise ValueError("number_time_points and num_averages must be positive")
if self.number_sensor_points <= 1:
    raise ValueError("number_sensor_points must be greater than 1")

P2 start_size is not validated

A start_size of 0 or a negative value passes __post_init__ silently, producing (0, 0, 0, 0) tuples from grid_sizes and a zero-division error inside build_case when computing dx = domain_size / nx.

Suggested change
- if self.number_time_points <= 0 or self.num_averages <= 0:
-     raise ValueError("number_time_points and num_averages must be positive")
- if self.number_sensor_points <= 1:
-     raise ValueError("number_sensor_points must be greater than 1")
+ if self.number_time_points <= 0 or self.num_averages <= 0:
+     raise ValueError("number_time_points and num_averages must be positive")
+ if self.start_size <= 0:
+     raise ValueError("start_size must be positive")
+ if self.number_sensor_points <= 1:
+     raise ValueError("number_sensor_points must be greater than 1")

Comment thread: tests/test_benchmark.py
Comment on lines +95 to +109
def test_run_saves_partial_results_after_solver_error(tmp_path: Path):
    options = small_options()
    output_path = tmp_path / "benchmark.json"

    def failing_solver(*args, **kwargs):
        raise RuntimeError("solver failed")

    result = run(options, max_cases=1, output_path=output_path, solver=failing_solver)

    assert result["error_reached"] is True
    assert result["error_message"] == "solver failed"
    assert output_path.exists()
    saved = json.loads(output_path.read_text())
    assert saved["error_reached"] is True
    assert saved["error_message"] == "solver failed"

P2 report_mem_usage=True path has no test coverage

The mem_usage list — conditionally added to result and to the JSON payload — is never exercised by the test suite. The _peak_memory_bytes helper, the rolling-average accumulation of loop_mem_usage, and the _save_results branch that serialises mem_usage are all untested.
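
A sketch of a test that would close this gap (illustrative only: small_options and run are taken from the existing tests; it assumes BenchmarkOptions is mutable and that the monkeypatch target matches how peak_memory_bytes is imported):

import json

def test_run_records_mem_usage_when_enabled(tmp_path, monkeypatch):
    # Force a deterministic, finite value so the test does not depend on the platform.
    # If benchmark.py imports the helper directly, patch "benchmarks.benchmark.peak_memory_bytes" instead.
    monkeypatch.setattr("benchmarks.helpers.peak_memory_bytes", lambda: 123456.0)

    options = small_options()
    options.report_mem_usage = True          # assumes the dataclass is not frozen
    output_path = tmp_path / "benchmark.json"

    result = run(options, max_cases=1, output_path=output_path, solver=lambda *a, **k: None)

    assert len(result["mem_usage"]) == len(result["comp_size"])
    saved = json.loads(output_path.read_text())
    assert saved["mem_usage"] == result["mem_usage"]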

Comment thread: benchmarks/helpers.py
Comment on lines +169 to +178
def peak_memory_bytes() -> float:
    try:
        import resource
    except ImportError:
        return float("nan")

    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if platform.system().lower() == "darwin":
        return float(usage)
    return float(usage * 1024)

P1 nan sentinel silently produces invalid JSON on Windows

peak_memory_bytes() returns float("nan") on Windows, where the resource module isn't available. That nan flows unguarded through rolling_average and store_case_result into result["mem_usage"]. Python's json.dumps serialises nan as the bare token NaN, which the JSON specification does not allow — so any run with --report-mem-usage on Windows silently writes a file that strict JSON parsers and most non-Python tooling will refuse to read back.

Consider raising early (ValueError: report_mem_usage is not supported on this platform) when resource is unavailable, or filtering/skipping the nan entry in save_results.
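
Both options are small changes; a sketch combining them (illustrative only, not the PR's code — the filtering step assumes it is acceptable for mem_usage to carry fewer entries than comp_size):

import json
import math
import platform
from typing import Any

def peak_memory_bytes() -> float:
    try:
        import resource
    except ImportError:
        # Fail loudly instead of returning a nan that later poisons the JSON output.
        raise ValueError("report_mem_usage is not supported on this platform")
    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return float(usage) if platform.system().lower() == "darwin" else float(usage * 1024)

def save_results(path, result: dict[str, Any]) -> None:
    # Drop any non-finite mem_usage samples so the file stays spec-compliant JSON;
    # allow_nan=False turns any remaining non-finite value into an immediate error.
    if "mem_usage" in result:
        result = {**result, "mem_usage": [m for m in result["mem_usage"] if math.isfinite(m)]}
    path.write_text(json.dumps(result, indent=2, allow_nan=False))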
